Search CORE

20 research outputs found

Containment of Shape Expression Schemas for RDF

Author: Staworko Slawek
Wieczorek Piotr
Publication venue
Publication date: 26/03/2018
Field of study

We study the problem of containment for shape expression schemas (ShEx) for RDF graphs. We identify a subclass of ShEx that has a natural graphical representation in the form of shape graphs and their semantics is captured with a tractable notion of embedding of an RDF graph in a shape graph. When applied to pairs of shape graphs, an embedding is a sufficient condition for containment, and for a practical subclass of deterministic shape graphs, it is also a necessary one, thus yielding a subclass with tractable containment. While for general shape graphs a minimal counter-example i.e., an instance proving non-containment, might be of exponential size, we show that containment is EXP-hard and in coNEXP. Finally, we show that containment for arbitrary ShEx is coNEXP-hard and in coTwoNEXP^NP

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Management of Inconsistencies in Data Integration

Author: Ioannou Ekaterini
Staworko Slawek
Publication venue
Publication date: 01/01/2013
Field of study

Data integration aims at providing a unified view over data coming from various sources. One of the most challenging tasks for data integration is handling the inconsistencies that appear in the integrated data in an efficient and effective manner. In this chapter, we provide a survey on techniques introduced for handling inconsistencies in data integration, focusing on two groups. The first group contains techniques for computing consistent query answers, and includes mechanisms for the compact representation of repairs, query rewriting, and logic programs. The second group contains techniques focusing on the resolution of inconsistencies. This includes methodologies for computing similarity between atomic values as well as similarity between groups of data, collective techniques, scaling to large datasets, and dealing with uncertainty that is related to inconsistencies

CiteSeerX

Edinburgh Research Explorer

Dagstuhl Research Online Publication Server

RDF Graph Alignment with Bisimulation

Author: Buneman Peter
Staworko Slawek
Publication venue: 'VLDB Endowment'
Publication date: 28/06/2016
Field of study

We investigate the problem of aligning two RDF databases, an essential problem in understanding the evolution of ontologies. Our approaches address three fundamental challenges: 1) the use of "blank" (null) names, 2) ontology changes in which different names are used to identify the same entity, and 3) small changes in the data values as well as small changes in the graph structure of the RDF database. We propose approaches inspired by the classical notion of graph bisimulation and extend them to capture the natural metrics of edit distance on the data values and the graph structure. We evaluate our methods on three evolving curated data sets. Overall, our results show that the proposed methods perform well and are scalable

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Edinburgh Research Explorer

Hal-Diderot

Learning Schemas for Unordered XML

Author: Ciucanu Radu
Staworko Slawek
Publication venue
Publication date: 01/01/2013
Field of study

We consider unordered XML, where the relative order among siblings is ignored, and we investigate the problem of learning schemas from examples given by the user. We focus on the schema formalisms proposed in [10]: disjunctive multiplicity schemas (DMS) and its restriction, disjunction-free multiplicity schemas (MS). A learning algorithm takes as input a set of XML documents which must satisfy the schema (i.e., positive examples) and a set of XML documents which must not satisfy the schema (i.e., negative examples), and returns a schema consistent with the examples. We investigate a learning framework inspired by Gold [18], where a learning algorithm should be sound i.e., always return a schema consistent with the examples given by the user, and complete i.e., able to produce every schema with a sufficiently rich set of examples. Additionally, the algorithm should be efficient i.e., polynomial in the size of the input. We prove that the DMS are learnable from positive examples only, but they are not learnable when we also allow negative examples. Moreover, we show that the MS are learnable in the presence of positive examples only, and also in the presence of both positive and negative examples. Furthermore, for the learnable cases, the proposed learning algorithms return minimal schemas consistent with the examples.Comment: Proceedings of the 14th International Symposium on Database Programming Languages (DBPL 2013), August 30, 2013, Riva del Garda, Trento, Ital

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Characterizing XML Twig Queries with Examples

Author: Staworko Slawek
Wieczorek Piotr
Publication venue
Publication date: 01/01/2015
Field of study

International audienceTypically, a (Boolean) query is a finite formula that defines a possibly infinite set of database instances that satisfy it (positive examples), and implicitly, the set of instances that do not satisfy the query (negative examples). We investigate the following natural question: for a given class of queries, is it possible to characterize every query with a finite set of positive and negative examples that no other query is consistent with.We study this question for twig queries and XML databases. We show that while twig queries are characterizable, they generally require exponential sets of examples. Consequently, we focus on a practical subclass of anchored twig queries and show that not only are they characterizable but also with polynomially-sized sets of examples. This result is obtained with the use of generalization operations on twig queries, whose application to an anchored twig query yields a properly contained and minimally different query. Our results illustrate further interesting and strong connections between the structure and the semantics of anchored twig queries that the class of arbitrary twig queries does not enjoy. Finally, we show that the class of unions of twig queries is not characterizable

CiteSeerX

INRIA a CCSD electronic archive server

HAL Descartes

Edinburgh Research Explorer

Dagstuhl Research Online Publication Server

Hal-Diderot

Bounded repairability for regular tree languages

Author: Puppis Gabriele
Riveros Cristian
Staworko Slawek
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012
Field of study

International audienceWe consider the problem of repairing unranked trees (e.g., XML documents) satisfying a given restriction specification R (e.g., a DTD) into unranked trees satisfying a given target specification T. Specifically, we focus on the question of whether one can get from any tree in a regular language R to some tree in another regular language T with a finite, uniformly bounded, number of edit operations (i.e., deletions and insertions of nodes). We give effective characterizations of the pairs of specifications R and T for which such a uniform bound exists, and we study the complexity of the problem under different representations of the regular tree languages (e.g., non-deterministic stepwise automata, deterministic stepwise automata, DTDs). Finally, we point out some connections with the analogous problem for regular languages of words

HAL - Lille 3

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Interactive Inference of Join Queries

Author: Bonifati Angela
Ciucanu Radu
Staworko Slawek
Publication venue
Publication date: 01/01/2013
Field of study

We investigate the problem of inferring join queries from user interactions. The user is presented with a set of candidate tuples and is asked to label them as positive or negative depending on whether she would like the tuples as part of the join result or not. The goal is to quickly infer an arbitrary n-ary join predicate across two relations by keeping the number of user interactions as minimal as possible. We assume no prior knowledge of the integrity constraints between the involved relations. This kind of scenario occurs in several application settings, such as data integration, reverse engineering of database queries, and constraint inference. In such a scenario, the user has either a partial knowledge of the database schemas or the database instances are too big to be skimmed. We explore the search space by using a set of strategies that let us prune what we call “uninformative” tuples, and directly present to the user the informative ones i.e., those that lead to quickly find the goal query that the user has in mind. In this paper, we focus on the inference of joins with equality predicates and we show that for such joins deciding whether a tuple is uninformative can be done in polynomial time. Next, we propose several strategies for presenting tuples to the user in a given order that lets minimize the number of interactions. We show the efficiency and scalability of our approach through an experimental study on both benchmark and synthetic datasets. Finally, we prove that adding projection to our queries makes the problem intractable. 1

CiteSeerX

HAL - Lille 3

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Learning Join Queries from User Examples

Author: Bonifati Angela
Ciucanu Radu
Staworko Slawek
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016
Field of study

International audienceWe investigate the problem of learning join queries from user examples. The user is presented with a set of candidate tuples and is asked to label them as positive or negative examples, depending on whether or not she would like the tuples as part of the join result. The goal is to quickly infer an arbitrary n-ary join predicate across an arbitrary number m of relations while keeping the number of user interactions as minimal as possible. We assume no prior knowledge of the integrity constraints across the involved relations. Inferring the join predicate across multiple relations when the referential constraints are unknown may occur in several applications, such as data integration, reverse engineering of database queries, and schema inference. In such scenarios, the number of tuples involved in the join is typically large. We introduce a set of strategies that let us inspect the search space and aggressively prune what we call uninformative tuples, and we directly present to the user the informative ones that is, those that allow the user to quickly find the goal query she has in mind. In this article, we focus on the inference of joins with equality predicates and also allow disjunctive join predicates and projection in the queries. We precisely characterize the frontier between tractability and intractability for the following problems of interest in these settings: consistency checking, learnability, and deciding the informativeness of a tuple. Next, we propose several strategies for presenting tuples to the user in a given order that allows minimization of the number of interactions. We show the efficiency of our approach through an experimental study on both benchmark and synthetic datasets

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

HAL

Hal-Diderot

Simple Schemas for Unordered XML

Author: Boneva Iovka
Ciucanu Radu
Staworko Slawek
Publication venue
Publication date: 01/01/2013
Field of study

We consider unordered XML, where the relative order among siblings is ignored, and propose two simple yet practical schema formalisms: disjunctive multiplicity schemas (DMS), and its restriction, disjunction-free multiplicity schemas (MS). We investigate their computational properties and characterize the complexity of the following static analysis problems: schema satisfiability, membership of a tree to the language of a schema, schema containment, twig query satisfiability, implication, and containment in the presence of schema. Our research indicates that the proposed formalisms retain much of the expressiveness of DTDs without an increase in computational complexity. 1

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Interactive Join Query Inference with JIM

Author: Bonifati Angela
Ciucanu Radu
Staworko Slawek
Publication venue
Publication date: 01/01/2014
Field of study

National audienceSpecifying join predicates may become a cumbersome task in many situations e.g., when the relations to be joined come from disparate data sources, when the values of the attributes carry little or no knowledge of metadata, or simply when the user is unfamiliar with querying formalisms. Such task is recurrent in many traditional data management applications, such as data integration, constraint inference, and database denormalization, but it is also becoming pivotal in novel crowdsourcing applications. We present Jim (Join Inference Machine), a system for interactive join specification tasks, where the user infers an n-ary join predicate by selecting tuples that are part of the join result via Boolean membership queries. The user can label tuples as positive or negative, while the system allows to identify and gray out the uninformative tuples i.e., those that do not add any information to the final learning goal. The tool also guides the user to reach her join inference goal with a minimal number of interactions

HAL - Lille 3

CiteSeerX

INRIA a CCSD electronic archive server

Edinburgh Research Explorer